AITopics | information metric

Collaborating Authors

information metric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mutual Information Tracks Policy Coherence in Reinforcement Learning

Reid, Cameron, Hafez, Wael, Nazeri, Amirhossein

arXiv.org Artificial IntelligenceSep-15-2025

Reinforcement Learning (RL) agents deployed in real-world environments face degradation from sensor faults, actuator wear, and environmental shifts, yet lack intrinsic mechanisms to detect and diagnose these failures. We present an information-theoretic framework that reveals both the fundamental dynamics of RL and provides practical methods for diagnosing deployment-time anomalies. Through analysis of state-action mutual information patterns in a robotic control task, we first demonstrate that successful learning exhibits characteristic information signatures: mutual information between states and actions steadily increases from 0.84 to 2.83 bits (238% growth) despite growing state entropy, indicating that agents develop increasingly selective attention to task-relevant patterns. Intriguingly, states, actions and next states joint mutual information, MI(S,A;S'), follows an inverted U-curve, peaking during early learning before declining as the agent specializes suggesting a transition from broad exploration to efficient exploitation. More immediately actionable, we show that information metrics can differentially diagnose system failures: observation-space, i.e., states noise (sensor faults) produces broad collapses across all information channels with pronounced drops in state-action coupling, while action-space noise (actuator faults) selectively disrupts action-outcome predictability while preserving state-action relationships. This differential diagnostic capability demonstrated through controlled perturbation experiments enables precise fault localization without architectural modifications or performance degradation. By establishing information patterns as both signatures of learning and diagnostic for system health, we provide the foundation for adaptive RL systems capable of autonomous fault detection and policy adjustment based on information-theoretic principles.

information, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2509.10423

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.34)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Enhancing Exploration Efficiency using Uncertainty-Aware Information Prediction

Kim, Seunghwan, Shin, Heejung, Yim, Gaeun, Kim, Changseung, Oh, Hyondong

arXiv.org Artificial IntelligenceDec-17-2024

Autonomous exploration is a crucial aspect of robotics, enabling robots to explore unknown environments and generate maps without prior knowledge. This paper proposes a method to enhance exploration efficiency by integrating neural network-based occupancy grid map prediction with uncertainty-aware Bayesian neural network. Uncertainty from neural network-based occupancy grid map prediction is probabilistically integrated into mutual information for exploration. To demonstrate the effectiveness of the proposed method, we conducted comparative simulations within a frontier exploration framework in a realistic simulator environment against various information metrics. The proposed method showed superior performance in terms of exploration efficiency.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2412.12825

Country:

Asia > South Korea > Ulsan > Ulsan (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Properties of the Concrete distribution

Chow, David D. K.

arXiv.org Machine LearningNov-2-2022

Like the more well-known Dirichlet distribution, it includes the uniform distribution on the simplex, but is otherwise distinct from the Dirichlet distribution. There is a temperature parameter, named by analogy with the Boltzmann (or Gibbs) distribution in thermodynamics. In the zero temperature limit, samples become concentrated at the corners of the simplex and the distribution approximates a categorical distribution: the limit of a continuous distribution approximates a discrete distribution, hence the portmanteau "Concrete". There are additionally K parameters that are easily interpreted as unnormalized probabilities for the K categories in this limit; a single constraint reduces these to K 1 independent parameters. The Concrete distribution has applications in machine learning for stochastic neural networks with discrete random variables, such as random sampling from discrete distributions and estimation of parameter gradients through backpropagation. One way of constructing the Concrete distribution is from combining Gumbel distributions through the softmax function, hence its alternative name of the Gumbel-softmax distribution. The differentiability of the softmax function, unlike the argmax function, allows for backpropagation.

artificial intelligence, concrete distribution, machine learning, (16 more...)

arXiv.org Machine Learning

2211.01306

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Add feedback

On the Dynamics of Inference and Learning

Berman, David S., Heckman, Jonathan J., Klinger, Marc

arXiv.org Machine LearningApr-19-2022

Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available this probability distribution becomes updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cram\'{e}r-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show how that for networks exhibiting small final losses the simple power-law is also satisfied.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2204.12939

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

On Information (pseudo) Metric

Baudot, Pierre

arXiv.org Machine LearningMar-2-2021

This short note revisit information metric, underlining that it is a pseudo metric on manifolds of observables (random variables), rather than as usual on probability laws. Geodesics are characterized in terms of their boundaries and conditional independence condition. Pythagorean theorem is given, providing in special case potentially interesting natural integer triplets. This metric is computed for illustration on Diabetes dataset using infotopo package.

information, information metric, triangle inequality, (13 more...)

arXiv.org Machine Learning

2103.02008

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > France (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Endocrinology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

Add feedback

Information Newton's flow: second-order optimization method in probability space

Wang, Yifei, Li, Wuchen

arXiv.org Machine LearningJan-16-2020

We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows. Here two information metrics are considered, including both the Fisher-Rao metric and the Wasserstein-2 metric. Several examples of information Newton's flows for learning objective/loss functions are provided, such as Kullback-Leibler (KL) divergence, Maximum mean discrepancy (MMD), and cross entropy. The asymptotic convergence results of proposed Newton's methods are provided. A known fact is that overdamped Langevin dynamics correspond to Wasserstein gradient flows of KL divergence. Extending this fact to Wasserstein Newton's flows of KL divergence, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling efficient variational methods to approximate Wasserstein Newton's directions. Several numerical examples in Gaussian families and Bayesian logistic regression are shown to demonstrate the effectiveness of the proposed method.

kl divergence, probability space, wasserstein newton, (14 more...)

arXiv.org Machine Learning

2001.04341

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Accelerated Information Gradient flow

Wang, Yifei, Li, Wuchen

arXiv.org Machine LearningSep-4-2019

We present a systematic framework for the Nesterov's accelerated gradient flows in the spaces of probabilities embedded with information metrics. Here two metrics are considered, including both the Fisher-Rao metric and the Wasserstein-$2$ metric. For the Wasserstein-$2$ metric case, we prove the convergence properties of the accelerated gradient flows, and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We formulate a novel bandwidth selection method, which learns the Wasserstein-$2$ gradient direction from Brownian-motion samples. Experimental results including Bayesian inference show the strength of the current method compared with the state-of-the-art.

gradient flow, survey article, upstream oil & gas, (18 more...)

arXiv.org Machine Learning

1909.02102

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

Deep network as memory space: complexity, generalization, disentangled representation and interpretability

Dong, X., Zhou, L.

arXiv.org Artificial IntelligenceJul-12-2019

By bridging deep networks and physics, the programme of geometrization of deep networks was proposed as a framework for the interpretability of deep learning systems. Following this programme we can apply two key ideas of physics, the geometrization of physics and the least action principle, on deep networks and deliver a new picture of deep networks: deep networks as memory space of information, where the capacity, robustness and efficiency of the memory are closely related with the complexity, generalization and disentanglement of deep networks. The key components of this understanding include:(1) a Fisher metric based formulation of the network complexity; (2)the least action (complexity=action) principle on deep networks and (3)the geometry built on deep network configurations. We will show how this picture will bring us a new understanding of the interpretability of deep learning systems.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1907.06572

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Texas > Kleberg County (0.04)
North America > United States > Texas > Chambers County (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Metric Gaussian Variational Inference

Knollmüller, Jakob, Enßlin, Torsten A.

arXiv.org Machine LearningJan-30-2019

A variational Gaussian approximation of the posterior distribution can be an excellent way to infer posterior quantities. However, to capture all posterior correlations the parametrization of the full covariance is required, which scales quadratic with the problem size. This scaling prohibits full-covariance approximations for large-scale problems. As a solution to this limitation we propose Metric Gaussian Variational Inference (MGVI). This procedure approximates the variational covariance such that it requires no parameters on its own and still provides reliable posterior correlations and uncertainties for all model parameters. We approximate the variational covariance with the inverse Fisher metric, a local estimate of the true posterior uncertainty. This covariance is only stored implicitly and all necessary quantities can be extracted from it by independent samples drawn from the approximating Gaussian. MGVI requires the minimization of a stochastic estimate of the Kullback-Leibler divergence only with respect to the mean of the variational Gaussian, a quantity that only scales linearly with the problem size. We motivate the choice of this covariance from an information geometric perspective. The method is validated against established approaches in a small example and the scaling is demonstrated in a problem with over a million parameters.

approximation, covariance, inference, (11 more...)

arXiv.org Machine Learning

1901.11033

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Add feedback

Inferring relevant features: from QFT to PCA

Bény, Cédric

arXiv.org Machine LearningFeb-16-2018

In many-body physics, renormalization techniques are used to extract aspects of a statistical or quantum state that are relevant at large scale, or for low energy experiments. Recent works have proposed that these features can be formally identified as those perturbations of the states whose distinguishability most resist coarse-graining. Here, we examine whether this same strategy can be used to identify important features of an unlabeled dataset. This approach indeed results in a technique very similar to kernel PCA (principal component analysis), but with a kernel function that is automatically adapted to the data, or "learned". We test this approach on handwritten digits, and find that the most relevant features are significantly better for classification than those obtained from a simple gaussian kernel.

artificial intelligence, kernel, machine learning, (15 more...)

arXiv.org Machine Learning

1802.05756

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback